perm filename IDEA.DNA[RDG,DBL] blob sn#600186 filedate 1981-07-20 generic text, type C, neo UTF8
Contents:
	Data base should contain:
	Tasks:
	Methods:
	Pie in the Sky ideas:
	Further thoughts & Criticisms
	New Ideas - 17-June
	Trouble with proposed evolutionary task:

Data base should contain:
	Known global transformations of DNA over evolution 
	  Details: Which sequences transformed into what
		   What this meant, phenotypically
		   Possible path towards this end (i.e. which local metamorphoses)
		   Time required for each step en route
		   Possible global (overall environmental) causes
	Known local transformations of DNA
	  Details: Which sequences transformed into what
		   Whether this might imply any phenotypic change
		   Time required for this
		   Possible specific causes (e.g. pH, or some enzyme) - Mechanism
**** Probably more levels than just these two ****
	Unexplained transformations - before/after pairs of sequences
	Expected transformations - before/after pairs of sequence-categories,
		   with Mechanism(s)

	Examples: These tie into above transformations
	Mechanisms: These tie into above transformations
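A minimal sketch of how one data base entry might be represented, assuming Python dataclasses. The class and field names (`Transformation`, `Mechanism`, `level`, `path`, and so on) are hypothetical and simply mirror the Details listed above; an "unexplained" transformation is then just an entry with no mechanism attached.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Mechanism:
    name: str                                   # hypothetical label, e.g. "point substitution"
    preconditions: List[str] = field(default_factory=list)

@dataclass
class Transformation:
    before: str                                 # sequence (or sequence-category) before
    after: str                                  # sequence (or sequence-category) after
    level: str                                  # "global" or "local"; more levels may be needed
    phenotypic_effect: Optional[str] = None
    time_required: Optional[str] = None         # units deliberately left open
    causes: List[str] = field(default_factory=list)
    mechanisms: List[Mechanism] = field(default_factory=list)
    path: List[str] = field(default_factory=list)      # known intermediate sequences
    examples: List[str] = field(default_factory=list)

    def is_explained(self) -> bool:
        # An unexplained transformation is a before/after pair with no mechanism.
        return bool(self.mechanisms)

# Toy entries for illustration only.
db: List[Transformation] = [
    Transformation(before="GAG", after="GTG", level="local",
                   mechanisms=[Mechanism("point substitution")]),
    Transformation(before="ATGAAA", after="ATGAAT", level="local"),
]
unexplained = [t for t in db if not t.is_explained()]
```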

Facilities:
	Search for similarities between two before/after pairs
	Attempt to explain a before/after pair in terms of a set of mechanisms
	Differencing routines -- to describe similarities/differences between a
	  pair of sequences
	  It should know which differences are significant - i.e. those which
	  generate a different AA (amino acid)
	  Perhaps it should use more global information - if "AA-1 AA-2" is
	  functionally quite analogous to "AA-1 AA-3", the bp change turning
	  AA-2 into AA-3 may not be considered that relevant, and so on.
	Rules which determine the [most likely] significant characteristics of a sequence
	  (see above for definition of significant)
	Clustering facility - to note rules of the form "a change like X must be
	  accompanied by a change like Y", when the "likes" are similar.
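A minimal sketch of the differencing routine, assuming the two sequences are already aligned, in frame, of equal length, and differ only by substitutions. It uses the standard genetic code to flag which codon differences change the amino acid. The function name `diff_codons` is hypothetical, and the finer notion of functionally analogous amino acids described above is not modelled.

```python
from typing import List, Tuple

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def diff_codons(a: str, b: str) -> List[Tuple[int, str, str, str, str, bool]]:
    """Compare two equal-length, in-frame coding sequences codon by codon.

    Returns (codon_index, old_codon, new_codon, old_aa, new_aa, significant)
    for every codon that differs; 'significant' means the amino acid changed.
    """
    if len(a) != len(b) or len(a) % 3:
        raise ValueError("sketch assumes equal-length, in-frame sequences")
    diffs = []
    for pos in range(0, len(a), 3):
        old, new = a[pos:pos + 3], b[pos:pos + 3]
        if old != new:
            old_aa, new_aa = CODON_TABLE[old], CODON_TABLE[new]
            diffs.append((pos // 3, old, new, old_aa, new_aa, old_aa != new_aa))
    return diffs
```

For example, `diff_codons("ATGGAGCTT", "ATGGTGCTC")` reports two differing codons: GAG to GTG (E to V, significant) and CTT to CTC (L to L, a silent change).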

Primitives:
	Transformation rules for a context-sensitive (CS) language, capable of full breadth-first search
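A minimal sketch of this primitive, assuming a transformation rule is just a (name, left-hand side, right-hand side) string rewrite; context sensitivity is expressed by including the flanking bases in the left-hand side (e.g. a rule "GAA" -> "GTA" changes an A only between a G and an A). The names `successors` and `derive` and the depth bound are illustrative assumptions. Because the search is breadth-first, the first derivation found uses the fewest rule applications.

```python
from collections import deque
from typing import Dict, Iterator, List, Optional, Tuple

Rule = Tuple[str, str, str]  # (rule name, left-hand side, right-hand side)

def successors(s: str, rules: List[Rule]) -> Iterator[Tuple[str, str]]:
    """Yield (rule_name, new_string) for every single application of every rule."""
    for name, lhs, rhs in rules:
        start = s.find(lhs)
        while start != -1:
            yield name, s[:start] + rhs + s[start + len(lhs):]
            start = s.find(lhs, start + 1)

def derive(a: str, b: str, rules: List[Rule], max_depth: int = 6) -> Optional[List[str]]:
    """Breadth-first search for the shortest rule sequence rewriting a into b."""
    parent: Dict[str, Tuple[Optional[str], Optional[str]]] = {a: (None, None)}
    frontier = deque([(a, 0)])
    while frontier:
        s, depth = frontier.popleft()
        if s == b:
            # Walk back through the parent links to recover the rule names.
            steps = []
            while parent[s][0] is not None:
                prev, name = parent[s]
                steps.append(name)
                s = prev
            return steps[::-1]
        if depth == max_depth:
            continue
        for name, t in successors(s, rules):
            if t not in parent:          # never revisit a sequence
                parent[t] = (s, name)
                frontier.append((t, depth + 1))
    return None
```

The number of reachable sequences grows exponentially with depth, so `max_depth` has to stay small; "full" breadth-first search is only practical for short sequences and few rules.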

Tasks:
	[All use Database and facilities defined above]

1) Given:
	DNA sequences A and B 
   Deduce:
	Plausible mechanism(s) for deriving B from A,
	   [using transformations given in DB, or appropriate analogs]
	Answer should include assumptions which had to be fulfilled en route

2) Given:
	DNA sequences A and C
   Deduce:
	A DNA sequence B s.t.
	∃ plausible mechanism(s) for deriving A from B, and C from B
	   noting where B-A line split from B-C line
	   [using procedure defined above]
	Answer should include assumptions which had to be fulfilled en route,
	   and possibly the cause of the bifurcation

Methods:

For  Task#1:
	Deduce similarities and differences between A and B.
	For each difference, determine set of plausible mechanisms
	Attempt to find some consistent set from (↑) whose members mutually require
	 similar preconditions, and which satisfies all global constraints.
	 Here one may be forced to suggest a transformation which is analogous
	 to some known change.
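A brute-force sketch of this step, under simplifying assumptions: each difference comes with a list of candidate mechanisms, each tagged with the preconditions it assumes; the global constraints are a single predicate over the combined preconditions; and "mutually require similar preconditions" is approximated by preferring the combination with the fewest distinct preconditions. All names here (`explain`, "M1", "low pH", and so on) are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple

Candidate = Tuple[str, FrozenSet[str]]  # (mechanism name, preconditions it assumes)

def explain(
    candidates: Dict[str, List[Candidate]],
    consistent: Callable[[FrozenSet[str]], bool],
) -> Optional[Dict[str, str]]:
    """Pick one mechanism per difference so that the combined preconditions
    pass the global constraint check, preferring the fewest distinct
    preconditions. Returns None if no consistent combination exists."""
    diffs = list(candidates)
    best, best_size = None, None
    for choice in product(*(candidates[d] for d in diffs)):
        assumed = frozenset().union(*(pre for _, pre in choice))
        if not consistent(assumed):
            continue
        if best_size is None or len(assumed) < best_size:
            best = {d: name for d, (name, _) in zip(diffs, choice)}
            best_size = len(assumed)
    return best

# Toy example: two differences, hypothetical mechanisms and preconditions.
candidates = {
    "diff1": [("M1", frozenset({"low pH"})), ("M2", frozenset({"enzyme E"}))],
    "diff2": [("M1", frozenset({"low pH"})), ("M3", frozenset({"high pH"}))],
}
no_ph_clash = lambda pre: not {"low pH", "high pH"} <= pre
best = explain(candidates, no_ph_clash)
```

Enumerating every combination is exponential in the number of differences; the sketch only makes the selection criterion concrete.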

For  Task#2:
	Deduce similarities and differences between A and C
	Attempt to explain differences in terms of some sub-part of (proposed) B
	 which might lead to both A and C.
	(Then run Task#1 on each of the pairs B - A and B - C.)
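A minimal sketch for the simplest case only, assuming A and C have equal length and differ only by point substitutions: every B that takes A's or C's base at each differing position minimizes the total number of substitutions along B -> A and B -> C. With only two sequences these candidates tie, which is why the split point (and B itself) needs further evidence, for example the mechanism plausibility that Task#1 provides. `candidate_ancestors` is a hypothetical name.

```python
from itertools import product
from typing import List

def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(p != q for p, q in zip(x, y))

def candidate_ancestors(a: str, c: str) -> List[str]:
    """All sequences B minimizing hamming(B, A) + hamming(B, C), assuming
    equal-length sequences related only by point substitutions."""
    if len(a) != len(c):
        raise ValueError("sketch assumes equal-length sequences")
    # Where A and C agree, B must agree too; where they differ, B may take either base.
    choices = [(x,) if x == y else (x, y) for x, y in zip(a, c)]
    return ["".join(p) for p in product(*choices)]
```

For instance, `candidate_ancestors("GATTACA", "GACTACA")` returns both input sequences, since either could be the ancestor with a single substitution on the other line.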

Pie in the Sky ideas:

Be able to produce rules for a Context Sensitive Language,
	derived from the examples shown
Design a (General Purpose) Analogy Routine: finds x s.t. A:B :: C:x
	(given A, B, C)
	This should be able to apply to, say, sequences A, B, C; to concepts;
	to mechanisms; and so on.
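A deliberately literal sketch for the sequence case only, assuming B differs from A by one contiguous edit: the routine finds the changed segment (after stripping the common prefix and suffix of A and B) and applies the same replacement to C, at the same position if it fits and otherwise at the first occurrence of the old segment. The function name `analogy` is hypothetical; extending it to concepts and mechanisms would need far richer representations.

```python
from typing import Optional

def analogy(a: str, b: str, c: str) -> Optional[str]:
    """Solve A:B :: C:x by transferring the single edit that turns A into B.
    Returns None when the edited segment does not occur in C."""
    if a == b:
        return c
    # Length of the common prefix of A and B.
    p = 0
    while p < min(len(a), len(b)) and a[p] == b[p]:
        p += 1
    # Length of the common suffix, not overlapping the prefix.
    s = 0
    while s < min(len(a), len(b)) - p and a[-1 - s] == b[-1 - s]:
        s += 1
    old, new = a[p:len(a) - s], b[p:len(b) - s]
    if not old and p > 0:
        # Pure insertion: widen the edit by one symbol of left context.
        p -= 1
        old, new = a[p:len(a) - s], b[p:len(b) - s]
    if not old:
        return new + c                      # insertion at the very start
    if c[p:p + len(old)] == old:            # same position in C, if it fits
        i = p
    else:
        i = c.find(old)                     # otherwise the first occurrence anywhere
        if i == -1:
            return None
    return c[:i] + new + c[i + len(old):]
```

For example, `analogy("ATG", "ATC", "GTG")` gives "GTC": the final G becomes C in both cases.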

Interface with Stefik's system:
	When mechanism is acceptable, design an experiment to investigate it.
	It could be used to decide which of 2 schemes seems more tractable
		[By letting his algorithm decide which to use]

Measure of chance this might happen:
	a priori - based on (1) structure of DNA so far
			    (2) likelihood of this alteration, in general
	a posteriori - based on viability of the resultant organism -
		[this should consider pathway, and check each intermediate]

	Factors should include:
	 Whether the required rate of mutation is feasible, both for DNA sequence & phenotypic changes
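A minimal sketch of how the a priori and a posteriori parts might combine, assuming the likelihood of each step and the viability of each organism are supplied as functions: the pathway's chance is the product of its step likelihoods, forced to zero if any intermediate (or the endpoint) is not viable. Both example functions (`one_substitution`, `no_taa`) are toy stand-ins, not biological models, and mutation-rate feasibility is not modelled.

```python
from math import prod
from typing import Callable, List

def pathway_chance(
    path: List[str],
    step_likelihood: Callable[[str, str], float],
    viable: Callable[[str], bool],
) -> float:
    """Product of per-step a priori likelihoods along the path,
    zeroed if any organism after the starting point is not viable."""
    if not all(viable(s) for s in path[1:]):
        return 0.0
    return prod(step_likelihood(x, y) for x, y in zip(path, path[1:]))

def one_substitution(x: str, y: str) -> float:
    # Toy a priori model: a single point substitution has chance 0.5, anything else 0.
    diffs = sum(p != q for p, q in zip(x, y))
    return 0.5 if len(x) == len(y) and diffs == 1 else 0.0

def no_taa(s: str) -> bool:
    # Toy viability test: reject any sequence containing "TAA".
    return "TAA" not in s
```

Checking every intermediate, not just the endpoint, is what lets a path through an inviable organism score zero even when its final sequence would be fine.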

Further thoughts & Criticisms
1) DBL: criteria for viability - both overall, & at each stage throughout
2) STT: Method for deciding on strategy - à la Means-Ends Analysis,
	to decide how to attempt to solve this puzzle
3) 	Some criteria for difficulty - perhaps some meaningful measure
	Perhaps tied with a metric between sequences, (or transformation-types)
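One standard candidate for such a metric between sequences is edit (Levenshtein) distance: the minimum total cost of substitutions, insertions and deletions turning one sequence into the other. The sketch assumes uniform, symmetric costs (`sub` and `indel` are illustrative parameters); a metric over transformation-types would instead weight each operation by how plausible the corresponding mechanism is.

```python
def edit_distance(x: str, y: str, sub: int = 1, indel: int = 1) -> int:
    """Levenshtein-style distance: minimum total cost of substitutions,
    insertions and deletions turning x into y (dynamic programming, row by row)."""
    prev = [j * indel for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        cur = [i * indel]
        for j in range(1, len(y) + 1):
            cost = 0 if x[i - 1] == y[j - 1] else sub
            cur.append(min(prev[j] + indel,       # delete x[i-1]
                           cur[j - 1] + indel,    # insert y[j-1]
                           prev[j - 1] + cost))   # match or substitute
        prev = cur
    return prev[-1]
```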

	New Ideas - 17-June
1. Biochemistry related
  a) Propose (and "verify") a better mechanism than the (Turing-machine-like) ribosome
	-- for going from DNA to protein (or other, better effector?)
	[Consider this mechanism to be a local maximum, sufficient for any given
	task, but sub-optimal.]
  b) Hypothesize a general purpose bacterium, which can "eat" anything [by digesting
	whatever is most available]... It has many local matchers, which trigger things
	which might digest it, and stores the result of this, together with trials, ...
	[This essentially experiments, à la MOLGEN - but in real life.]
  c) Propose a slew of mechanisms which might account for ... and show how easily each
	accounts for this effect.

  d) For simulating: Modelling vs Reasoning [note RWW SIMULATION structures]

2. Large number of (small caches of) domain facts -- then try to do analogy.

3. Programming - perhaps better formalization -- see MRG

4. Music ? - patterns on patterns. (small deciphering task for RLL, to determine
	whether Mozart or Bach wrote X.) Generation? Levels of characteristics...


Talk with Stefik, Brutlag 
  1) What knowledge is available, and in what form.
	Ala Rheumatic DB?; library of genes (including mechanisms?)
  2) What are non-trivial tasks, for molecular biology? What are the good
	theses?

Trouble with proposed evolutionary task:
First, a digression, to explore the possibilities which would arise
from the following task:

You have been given n sequences of characters, and asked to find how closely
they are related, in pairs.
(Unbeknownst to you, these are Shakespearean plays.)

By blind search alone you may find "syntactic" matches - i.e. a particular
string of symbols seems to occur in some vaguely described context with
a high frequency.  You might even discover that the most common symbol,
" ", appears on average every 5 or so symbols, within a relatively
restricted range -- from a separation of 1 to a maximum of 15.
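Statistics of this kind are easy to compute blindly. A minimal sketch, assuming the whole text is one string: find the most frequent symbol, then measure the distances between its successive occurrences. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict

def most_common_symbol(text: str) -> str:
    """The single most frequent character in the text."""
    return Counter(text).most_common(1)[0][0]

def gap_profile(text: str, symbol: str = " ") -> Dict[str, object]:
    """Distances between successive occurrences of `symbol` in `text`."""
    positions = [i for i, ch in enumerate(text) if ch == symbol]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    if not gaps:
        return {"mean": None, "min": None, "max": None, "histogram": Counter()}
    return {"mean": mean(gaps), "min": min(gaps), "max": max(gaps),
            "histogram": Counter(gaps)}
```

On the toy string "to be or not to be", the most common symbol is the blank, and the gaps between blanks are 3, 3, 4 and 3.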

You might also find characters, such as "." or ",", which seem to occur
only immediately before a " ", and never before one another, or that blank.
One could soon find select prefixes and suffixes (such as "ing" or "s");
and by subtracting these out, certain morphological roots may become apparent.
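A crude sketch of this subtraction step, assuming the text has already been split into "words" at the blanks: an ending counts as a candidate suffix when stripping it from several words leaves stems that are themselves words in the text. `find_suffixes`, `roots` and the thresholds are hypothetical choices; prefixes could be handled symmetrically.

```python
from collections import Counter
from typing import Dict, Iterable, List

def find_suffixes(words: Iterable[str], max_len: int = 3, min_count: int = 2) -> List[str]:
    """Endings that, when removed, leave another word that also occurs in the text."""
    vocab = set(words)
    counts: Counter = Counter()
    for w in vocab:
        for k in range(1, max_len + 1):
            stem, suffix = w[:-k], w[-k:]
            if len(stem) >= 2 and stem in vocab:
                counts[suffix] += 1
    return sorted(s for s, n in counts.items() if n >= min_count)

def roots(words: Iterable[str], suffixes: List[str]) -> Dict[str, str]:
    """Map each word to its root by stripping the longest matching suffix,
    provided at least two symbols of root remain."""
    result = {}
    for w in set(words):
        root = w
        for s in sorted(suffixes, key=len, reverse=True):
            if w.endswith(s) and len(w) - len(s) >= 2:
                root = w[: -len(s)]
                break
        result[w] = root
    return result
```

With the toy word list "walk walks walking talk talks talking sing", this finds the suffixes "ing" and "s" and reduces "walking" to "walk", while "sing" is left alone only because of the minimum-root-length guard.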

Given enough cycles, I'll even believe one could find parts of speech - e.g.
this "word" (now defined, using the delimiters shown above) is a noun, and can
therefore occur in the following places; analogously this verb can be modified
in this way.

The point of the above discussion is that this is ALL we'll get, given this
investigative framework.  Lexical and syntactic notions could be derived,
but nothing with "semantic" content.
Such a matcher might deduce that Macbeth is more closely related to Henry IV, Part 2
than to The Merry Wives of Windsor because the tragedies/histories have more
occurrences of words like "kill" - or it might be quite content simply to
observe that Falstaff and a few of his cronies occur in both HIV and MWW,
and reach the opposite (and here, it turns out, correct) conclusion that
HIV and MWW had a common root in Shakespeare's mind.
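To make the point concrete, a minimal sketch of a purely lexical relatedness score of the kind such a matcher could compute: shared words are weighted by how few documents contain them (an IDF-style weight, chosen here as an assumption). With the toy word lists below (invented, not taken from the plays), the shared "king" and "kill" outweigh the single shared "falstaff", so the score lands on the conclusion the text calls incorrect; nothing in the computation can tell which overlap actually matters.

```python
from collections import Counter
from math import log
from typing import Dict, List, Tuple

def relatedness(docs: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Pairwise score: sum of log(N / df) over the words two documents share,
    where df is the number of documents containing the word. Words found in
    every document score 0; words shared by few documents score more."""
    n = len(docs)
    vocab = {name: set(words) for name, words in docs.items()}
    df = Counter(w for ws in vocab.values() for w in ws)
    names = sorted(docs)
    scores = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            scores[(a, b)] = sum(log(n / df[w]) for w in vocab[a] & vocab[b])
    return scores

# Toy word lists, NOT text from the plays.
docs = {
    "H4": ["falstaff", "king", "kill", "the"],
    "MWW": ["falstaff", "wives", "the"],
    "Mac": ["king", "kill", "the", "witch"],
}
scores = relatedness(docs)
```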

Clearly this matcher would be far more powerful if it had some
knowledge of, for example, scripts or plans, as well as an elaborate
lexicon relating (at minimum) synonyms to one another; a more complete
semantic net would be even more useful.

The analogous knowledge in Molecular Genetics would include things
like chemical pathways, or various levels of functionality of the eventual
proteins, or ... Even things like coding-region endmarks, while deducible, would
be useful; these correspond to the " " in the text example above.